Skip to content

Make CREATE SCHEMA migration DDL concurrent-safe against pg_namespace race#282

Merged
jeremydmiller merged 1 commit into
masterfrom
fix/marten-4445-create-schema-if-not-exists
May 18, 2026
Merged

Make CREATE SCHEMA migration DDL concurrent-safe against pg_namespace race#282
jeremydmiller merged 1 commit into
masterfrom
fix/marten-4445-create-schema-if-not-exists

Conversation

@jeremydmiller
Copy link
Copy Markdown
Member

Summary

  • PostgresqlMigrator.WriteSql emitted IF NOT EXISTS(information_schema.schemata) THEN EXECUTE 'CREATE SCHEMA …', which is not atomic against concurrent sessions: both sessions can pass the existence check, then race on the insert into pg_namespace and one loses with 23505 duplicate key value violates unique constraint pg_namespace_nspname_index. PostgreSQL's own CREATE SCHEMA IF NOT EXISTS has the same window (see e.g. pg-hackers thread).
  • Wrap the create in a plpgsql sub-block that catches duplicate_schema (42P06) and unique_violation (23505) specifically, so the losing session treats the race as benign. Any other error still propagates.
  • Tracked downstream as Marten flake JasperFx/marten#4445 — three full_text_index tests fail intermittently against the existing master with the exact 23505 symptom.

Why this minimal form

The published DDL is wrapped in an outer DO $$ BEGIN … END $$; (one block per migration); each schema gets a nested BEGIN … EXCEPTION … END; sub-block so the EXCEPTION clause only suppresses the race for the current CREATE SCHEMA, not for the rest of the migration's DDL.

Test plan

  • dotnet build src/Weasel.Postgresql/Weasel.Postgresql.csproj succeeds
  • Built a local 9.0.0-alpha.3-fix4445 nupkg, pointed Marten at it, ran DocumentDbTests.Indexes.full_text_index (31 tests, 5 currently [Fact(Skip = "…#4445")]) — unskipping all 5 and running 3× in a row with DISABLE_TEST_PARALLELIZATION=true is now 31/31 passed deterministically (was Failed: 3+/31 intermittently)
  • Run existing Weasel.Postgresql.Tests suite (the Marten-side validation is the strongest signal because Weasel has no test asserting on the DDL emission text)

🤖 Generated with Claude Code

The migrator's WriteSql block emitted "IF NOT EXISTS(information_schema.schemata)
THEN EXECUTE 'CREATE SCHEMA …'", which is not atomic against concurrent sessions:
two sessions can both pass the existence check, then race on the insert into
pg_namespace and one loses with "23505 duplicate key value violates unique
constraint pg_namespace_nspname_index". PostgreSQL's own "CREATE SCHEMA IF NOT
EXISTS" has the same window. Wrap the create in a plpgsql sub-block that
catches duplicate_schema (42P06) and unique_violation (23505) specifically, so
the losing session treats the race as benign; any other error still propagates.

Tracked downstream as Marten flake: JasperFx/marten#4445.
@jeremydmiller jeremydmiller merged commit 6b6d296 into master May 18, 2026
21 checks passed
@jeremydmiller jeremydmiller deleted the fix/marten-4445-create-schema-if-not-exists branch May 18, 2026 00:05
jeremydmiller added a commit to JasperFx/marten that referenced this pull request May 18, 2026
…tgresql

Weasel 9.0.0-alpha.5 ships in lockstep (Weasel.Postgresql ⇒ this PR's
existing bump, Weasel.EntityFrameworkCore ⇒ this commit). Picks up the
CREATE SCHEMA race fix from JasperFx/weasel#282 needed by the un-skipped
full_text_index tests on this branch.
jeremydmiller added a commit to JasperFx/marten that referenced this pull request May 18, 2026
…ma (#4446)

* [#4445] Un-skip full_text_index race-flake tests; isolate shared schema

Pairs with weasel#282, which catches the duplicate_schema / unique_violation
that two concurrent sessions race on when both emit "CREATE SCHEMA <name>".
With that DDL fix in Weasel.Postgresql 9.0.0-alpha.5, the FTI fixture passes
deterministically as a suite (was 3+ tests failing intermittently with
"23505 pg_namespace_nspname_index").

The "fulltext"-schema test additionally has a data-accumulation problem
independent of the race: DocumentStore.For() with a custom DatabaseSchemaName
bypasses OneOffConfigurationsContext's cleanAll step (which only drops the
fixture's own schema). Drop the "fulltext" schema in a pre-flight outside the
#region so the docs sample stays clean.

* Bump Weasel.EntityFrameworkCore to 9.0.0-alpha.5 alongside Weasel.Postgresql

Weasel 9.0.0-alpha.5 ships in lockstep (Weasel.Postgresql ⇒ this PR's
existing bump, Weasel.EntityFrameworkCore ⇒ this commit). Picks up the
CREATE SCHEMA race fix from JasperFx/weasel#282 needed by the un-skipped
full_text_index tests on this branch.
jeremydmiller added a commit that referenced this pull request May 24, 2026
closes #293) (#294)

Follow-up to #282. That fix wrapped only CREATE SCHEMA in a plpgsql block
catching 42P06/23505. The rest of the migration DDL emitted by
migration.WriteAllUpdates(...) — CREATE OR REPLACE FUNCTION mt_immutable_*,
CREATE TABLE IF NOT EXISTS, etc. — stayed unguarded, so concurrent lazy
EnsureStorageExists against the same database could fail with
"XX000: tuple concurrently updated" (two backends updating the same
pg_proc / catalog row at once). Seen as a Marten conjoined multi-tenant CI
flake (JasperFx/marten#4552).

The DDL Weasel emits is idempotent (IF NOT EXISTS / CREATE OR REPLACE) and
executeDelta runs each statement on the bare connection (autocommit, no
ambient transaction), so a statement that lost a catalog race is safe to
re-run from a clean slate. Wrap each cmd.ExecuteNonQueryAsync in a bounded
retry (3 attempts, jittered sub-100ms backoff) keyed on the transient
SQLSTATEs:

  - 40001 serialization_failure
  - 40P01 deadlock_detected
  - XX000 internal_error ONLY when the message is "tuple concurrently
    updated" (XX000 is a catch-all, so the message guard avoids
    blanket-retrying unrelated internal errors)

The retry sits inside the existing try/catch, so after exhausting attempts
the established logger.OnFailure / rethrow path is unchanged. Kept
PG-specific (these SQLSTATEs are PostgreSQL's; executeDelta is a per-provider
override).

The classification is extracted as a pure internal predicate
IsTransientCatalogConcurrency(sqlState, messageText) and covered by 12 unit
assertions (the race itself isn't deterministically testable). Happy-path
migration/schema integration tests pass unchanged (21/22, 1 pre-existing
skip). Downstream Marten#4552 stabilizes once it consumes a Weasel build
with this.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant